(54) Method for automatic detection of human eyes in digital images

(57) A computer program product for locating first and second objects, each having substantially the same physical characteristics, where the ratio of the distance between the first and second objects to the size of each object is substantially invariant. The computer program product comprises a computer readable storage medium having a computer program stored thereon for performing the steps of: determining potential flesh regions in an image; determining valley regions in an image; performing template matching for determining a plurality of locations that give a desirable match of the object relative to the template; and performing verification for determining the likelihood of pairs of potential eye candidates at the locations determined in the template matching step.
Description
APPENDIX 

[0001] The disclosure in the appendix of this patent document contains material to which a claim of copyright protection is made. The copyright owner has no objection to the facsimile reproduction of any one of the patent documents or the patent disclosure, as it appears in the U.S. Patent and Trademark Office patent file or records, but reserves all other rights whatsoever.

[0002] The invention relates generally to the field of digital image processing and, more particularly, to locating objects in a digital image.

[0003] Identifying objects in an image is performed in a variety of image processing functions. For example, in correcting for red-eye in images, the human eye is located and the undesirable red portion in the eye is replaced with a more aesthetically pleasing color. In the "KODAK" digital print station, the image is displayed on a touch screen and one eye is repeatedly touched for further zooming in on the red portion of the eye upon each touch. The red portion of the eye is then identified by searching for red pixels in the area defined by the zooming process, and the identified red pixels are replaced with a predetermined color for making the image more aesthetically pleasant. The process is then repeated for the other eye.

[0004] A neural networks method of locating human eyes is disclosed in Learning and Example Selection for Object and Pattern Recognition, The AI-Lab, MIT by K. K. Sung, November 1995. This method discloses training a neural net to recognize eyes with acceptable distortion from a pre-selected eye template. The operator repeatedly distorts the original eye template, and all variations produced from distorting the eye are labeled as either acceptable or unacceptable. The distorted samples, that is, the training images, and the associated labeling information are fed to the neural net. This training process is repeated until the neural net has achieved satisfactory recognition performance for the training images. The trained neural net effectively has stored possible variations of the eye. Locating an eye is done by feeding a region in the image to the neural net for determining if a desired output, that is, a match, occurs; all matches are identified as an eye.

[0005] Although the presently known and utilized methods of identifying eyes are satisfactory, they are not without drawbacks. The touch screen method requires constant human interaction of repeatedly touching the touch screen for zooming in on the eye and, as a result, is somewhat labor intensive. Still further, the neural net method requires extensive training and is also computationally intensive in the matching process because an exhaustive search has to be performed for all the possible sizes and orientations of the eye.

[0006] Consequently, a need exists for improvements in the method of locating objects in an image so as to overcome 
the above-described drawbacks. 

[0007] The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, the invention is directed to a computer program product for locating first and second objects, each having substantially the same physical characteristics, and the ratio of the distance between the first and second objects and the size of each object is substantially invariant, the computer program product comprising: a computer readable storage medium having a computer program stored thereon for performing the steps of: (a) determining potential flesh regions in an image; (b) determining valley regions in an image; (c) performing template matching for determining a plurality of locations that give a desirable match of the object relative to the template; and (d) performing verification for determining the likelihood of pairs of potential eye candidates at the locations determined in step (c).

[0008] It is an object of the present invention to provide a method of finding objects in an image which overcomes the 
above-described drawbacks. 

[0009] It is also an object of the present invention to provide a method of finding objects in an image in an automated manner.

[0010] It is a further object of the present invention to provide a method of estimating the physical size of the objects 
to be found. 

[0011] It is still a further object of the present invention to provide a method of estimating the physical orientation of the objects to be found.

[0012] It is an advantage of the present invention to provide an efficient method of locating objects in an image.
[0013] It is a feature of the present invention to determine an estimated size of each object based on the shape and size of the region where the said objects potentially reside.

[0014] It is a feature of the present invention to determine an estimated orientation of each object based on the shape and orientation of the region where the said objects potentially reside.

[0015] It is a feature of the present invention to determine a pair (or a group) of objects based on a plurality of figures of merit determined from prior knowledge about the relationship between the first and second objects.
[0016] The above and other objects of the present invention will become more apparent when taken in conjunction with the following description and drawings wherein identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

Fig. 1 is a perspective view of a computer system for implementing the present invention; 
Fig. 2a is a diagram illustrating the searching procedure used by the present invention;

Fig. 2b is a detailed diagram illustrating the zone-based cross-correlation process;

Fig. 3 is a detailed diagram illustrating the flesh detection process; 

Fig. 4 is a detailed diagram illustrating the valley detection process; 

Fig. 5 is a view of the zone partition of the template by the present invention; 
Fig. 6 is an illustration of the pairing of eye candidates;

Fig. 7 is an illustration of the verification procedure for the distance between and orientations of the two eyes; 

Fig. 8 is an illustration of matching of the eye-to-eye profile; 

Fig. 9 is an illustration of the scoring function; and 

Fig. 10 is an illustration of the face-box and mouth-box. 

[0017] In the following description, the present invention will be described in the preferred embodiment as a software program. Those skilled in the art will readily recognize that the equivalent of such software may also be constructed in hardware.

[0018] Still further, as used herein, computer readable storage medium may comprise, for example: magnetic storage media such as a magnetic disk (such as a floppy disk) or magnetic tape; optical storage media such as an optical disc, optical tape, or machine readable bar code; solid state electronic storage devices such as random access memory (RAM), or read only memory (ROM); or any other physical device or medium employed to store a computer program.
[0019] Referring to Fig. 1, there is illustrated a computer system 10 for implementing the present invention. Although the computer system 10 is shown for the purpose of illustrating a preferred embodiment, the present invention is not limited to the computer system 10 shown, but may be used on any electronic processing system. The computer system 10 includes a microprocessor based unit 20 for receiving and processing software programs and for performing other processing functions. A touch screen display 30 is electrically connected to the microprocessor based unit 20 for displaying user related information associated with the software, and for receiving user input via touching the screen. A keyboard 40 is also connected to the microprocessor based unit 20 for permitting a user to input information to the software. As an alternative to using the keyboard 40 for input, a mouse 50 may be used for moving a selector 52 on the display 30 and for selecting an item on which the selector 52 overlays, as is well known in the art.
[0020] A compact disk-read only memory (CD-ROM) 55 is connected to the microprocessor based unit 20 for receiving software programs and for providing a means of inputting the software programs and other information to the microprocessor based unit 20 via a compact disk 57, which typically includes a software program. In addition, a floppy disk 61 may also include a software program, and is inserted into the microprocessor based unit 20 for inputting the software program. Still further, the microprocessor based unit 20 may be programmed, as is well known in the art, for storing the software program internally. A printer 56 is connected to the microprocessor based unit 20 for printing a hardcopy of the output of the computer system 10.

[0021] Images may also be displayed on the display 30 via a personal computer card (PC card) 62 or, as it was formerly known, a personal computer memory card international association card (PCMCIA card), which contains digitized images electronically embodied in the card 62. The PC card 62 is ultimately inserted into the microprocessor based unit 20 for permitting visual display of the image on the display 30.

[0022] Referring to Fig. 2a, there is illustrated a flowchart of a software program of the present invention. Before discussing the details of the flowchart, it is instructive to note that, although a portion of the program includes detecting human flesh, any animal flesh may be detected provided the program is modified, as will be apparent to those skilled in the art. The program is initiated S2 and then detection of human flesh is performed to create a flesh map S4.
[0023] Referring to Fig. 3, there is illustrated a detailed flowchart of creating the flesh map S4. In this regard, a color image is input S4a into the microprocessor-based unit 20 by any well known means, such as the PC card 62, and is converted S4b into a color space, preferably LST color space. The image code values are then quantized S4c to reduce the total number of histogram bins. A three dimensional (3D) histogram is created S4d for the typically 3-channel color image. This 3D histogram is smoothed S4e to reduce the noise, and the peaks in the 3D histogram are then located S4f. Bin clustering is performed by assigning a peak to each bin of the histogram S4g. For each pixel in the color image, a value is assigned based on the bin that corresponds to the color of the pixel S4h. Connected regions that have at least a minimum number of pixels (MinNoPixels), preferably 50 pixels although other values may also be used, are labeled S4i. The maximum allowed number of regions is MaxNoRegions S4i, preferably 20 regions although other values may also be used. Based on the average transformed color component values of human flesh and the average color values of a given region, a flesh probability Pskin is calculated for each labeled region S4j. A unique label is assigned to a region with Pskin greater than SkinThreshold S4k, preferably 0.7 although other values may also be used. All the non-flesh pixels are set to zero. If two or more flesh regions touch, they are merged into a single flesh region S4l.

[0024] Referring to Fig. 4, the program receives the flesh map data and performs a well known elliptical fitting technique for each connected flesh region. The aspect ratio and compactness of the fitted ellipse are measured to reject flesh regions of irregular shapes. The aspect ratio is defined as the ratio of the long axis to the short axis, and compactness is defined as the ratio of the area enclosed by the fitted ellipse to the area of the entire flesh region. A flesh region is rejected if the aspect ratio of its fitted ellipse is above three or if its compactness is below 0.9. Referring back to Fig. 2a, the program then estimates the size and orientation of the eyes S6 according to the ellipse fitted to each remaining flesh region using the following equation, which is graphically illustrated in Fig. 4:

s = b/4    (Eq. 1)

where b is the length of the minor axis of the fitted ellipse in pixels and s is the estimated size, or length, of the eye in pixels.
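The shape screening and the size estimate above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `FittedEllipse` structure and both function names are hypothetical and do not come from the appendix; the thresholds (3 and 0.9) and the relation s = b/4 are taken from the text.

```cpp
#include <cassert>

// Hypothetical container for the result of ellipse fitting (names are illustrative).
struct FittedEllipse {
    double majorAxis;    // long-axis length in pixels
    double minorAxis;    // short-axis length in pixels (b in the text)
    double ellipseArea;  // area enclosed by the fitted ellipse, in pixels
    double regionArea;   // area of the flesh region itself, in pixels
};

// A region is kept only if the aspect ratio is at most 3 and the
// compactness (ellipse area / region area) is at least 0.9.
bool isPlausibleFaceRegion(const FittedEllipse& e) {
    assert(e.minorAxis > 0.0 && e.regionArea > 0.0);
    const double aspectRatio = e.majorAxis / e.minorAxis;
    const double compactness = e.ellipseArea / e.regionArea;
    return aspectRatio <= 3.0 && compactness >= 0.9;
}

// Estimated eye size in pixels, s = b / 4.
double estimateEyeSize(const FittedEllipse& e) {
    return e.minorAxis / 4.0;
}
```

For a region whose fitted ellipse has a minor axis of 80 pixels, `estimateEyeSize` gives an eye size of 20 pixels, which would then drive the resizing of the sub-image.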

[0025] An estimated angular orientation of the eye is also generated from the orientation of the fitted ellipse S6, as illustrated in Fig. 4. The assumption is that the two eyes are aligned and therefore the orientation of each eye is approximately the same as the orientation of the minor axis of the fitted ellipse. This angle, denoted by θ, is between the minor axis and a horizontal line. A sub-image is extracted for each flesh region S8. It is instructive to note that, from this estimated eye size, the resolution of the extracted sub-image is changed so that the eyes in the image have approximately the same size as the eye template S8. As shown in Fig. 6, a particular eye template has a resolution of 19 pixels horizontally and 13 pixels vertically. This resolution change, or resizing, enables the eyes in the images to be matched at the same resolution of a template and against the same amount of structural detail, as will be described in detail herein below. An alternative is to design a set of templates with different amounts of detail and keep the resolution of the image unchanged. Such an alternative design is readily accomplished by those skilled in the art.

[0026] Referring back to Fig. 2a, valley detection is performed to create a valley map for each extracted sub-image S10. The purpose of valley detection is to remove flat flesh regions from further consideration. Referring to Fig. 5, valley detection consists of a plurality of steps. First, a smoothing operation is performed to reduce any noise and small specular highlight in the eye or eyeglasses that may be present, preferably using a morphological opening followed by a morphological closing S10a. The valley regions in the smoothed image are identified, preferably as the output of the difference between a morphologically closed image and itself S10b. The code values of this output image represent the confidence of the presence of a valley. A median filtering operation is applied to the obtained valley image to reduce noise and long thin structures in the image S10c. In general, eyes do not reside in any long thin structure in the valley image. Also, the absolute code value in the valley image does not necessarily correspond to likelihood of an eye socket. However, local maximums in the processed valley image are emphasized, preferably using the difference between the valley image and its morphologically closed version S10d. The result of S10d is combined with the result of texture suppression S10e as well as other inhibitory mechanisms S10f, for example, "redness" of the eye in the case of red-eye detection, by a logic AND operation S10g.

[0027] After S10g, pixels with code values greater than a predetermined threshold are set to one, and zero otherwise. A binary valley map is thus created S10h. A mask map for directing the subsequent searching is created as the intersection of the skin map and valley map S12.
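The thresholding of S10h and the intersection of S12 can be illustrated as follows. This is a minimal sketch, assuming both maps are stored as flat arrays of 8-bit code values of equal length; the function names `binarize` and `searchMask` are hypothetical and are not the routines of the appendix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binary valley map: 1 where the code value exceeds the threshold, 0 otherwise.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& valley,
                                   std::uint8_t threshold) {
    std::vector<std::uint8_t> out(valley.size());
    for (std::size_t i = 0; i < valley.size(); ++i)
        out[i] = valley[i] > threshold ? 1 : 0;
    return out;
}

// Search mask: intersection of the flesh map (nonzero = flesh) and the
// binary valley map. Only pixels set in both maps are searched later.
std::vector<std::uint8_t> searchMask(const std::vector<std::uint8_t>& flesh,
                                     const std::vector<std::uint8_t>& valleyBinary) {
    assert(flesh.size() == valleyBinary.size());
    std::vector<std::uint8_t> mask(flesh.size());
    for (std::size_t i = 0; i < flesh.size(); ++i)
        mask[i] = (flesh[i] != 0 && valleyBinary[i] != 0) ? 1 : 0;
    return mask;
}
```

Restricting the later template search to the nonzero pixels of this mask is what keeps the correlation step from visiting every pixel of the sub-image.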

[0028] The cross-correlation between the template and the image is computed by sequentially moving the center pixel of the template to each pixel in the searching mask map of each flesh region and performing a specific type of zone-based cross-correlation at each pixel location for determining the center pixel of the eye S14, as will be described in detail below.

[0029] Referring briefly to Fig. 2b, a zone-based cross-correlation S14 is initialized S14a. A template is then retrieved and normalized S14b, if it is not already stored in a normalized state. Referring briefly to Fig. 6, the template is preferably generated from sampling a plurality of eyes and relating their corresponding pixel values, for example by taking the average values at each pixel location. The template is then partitioned into four sub-regions that represent the eyelid, iris, and the two corners of the eye. To normalize the template, the average pixel value for the entire template image is subtracted from each pixel value and the resulting pixel value is divided by the standard deviation of the entire template image for obtaining a normalized pixel value. The resulting template therefore has a mean value of zero and a unit variance.
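The normalization just described amounts to the following. The sketch assumes the template (or image block) is held as a flat vector of pixel values and uses the population standard deviation; the function name `normalizeBlock` is hypothetical, and the handling of a perfectly flat block is an added safeguard not discussed in the text.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalize pixel values in place to zero mean and unit variance.
// Returns false (leaving the data unchanged) for an empty or flat block,
// where the standard deviation is zero and the division is undefined.
bool normalizeBlock(std::vector<double>& pixels) {
    if (pixels.empty()) return false;
    const double n = static_cast<double>(pixels.size());
    double mean = 0.0;
    for (double p : pixels) mean += p;
    mean /= n;
    double variance = 0.0;
    for (double p : pixels) variance += (p - mean) * (p - mean);
    variance /= n;
    if (variance <= 0.0) return false;
    const double stdDev = std::sqrt(variance);
    for (double& p : pixels) p = (p - mean) / stdDev;
    return true;
}
```

Applying the same routine to each extracted image block makes the subsequent correlation insensitive to overall brightness and contrast differences between the block and the template.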

[0030] More specifically, and referring back to Fig. 2b, with the center of the template at the pixel location of interest, the zone-based cross-correlation includes, first, extracting a block from the image with its center at the current pixel and its size/orientation the same as the template S14c; normalizing the extracted image block S14d; and computing the cross-correlation between each sub-region of the extracted block and its counterpart in the template with the pixel of the image at the center of the sub-region S14e, hereinafter referred to as a zone-based correlation. If the cross-correlation for each sub-zone meets or exceeds a predetermined threshold, preferably 0.5, cross-correlation is performed with the entire template to the same image pixels of interest S14f, hereinafter referred to as a complete correlation. If a threshold, preferably 0.7, is again met, the program temporarily stores the correlation value and the size/orientation of the template in a buffer S14h. If the cross-correlation for one or more sub-zones fails the threshold or the cross-correlation for the entire template fails the threshold, the cross-correlation at the pixel of interest is set to "0" and the associated size/orientation are set to "N/A" S14i. If the current pixel is not the last pixel with nonzero mask value in the concerned flesh region, the program then continues to the next pixel location in the search mask map S14l for repeating the above-described partitioned and complete correlation operations.
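The two-stage test above (zone-based correlation first, complete correlation only if every zone passes) might be sketched as follows. Several things here are assumptions for illustration: blocks and templates are flat vectors, each zone is given as a non-empty list of pixel indices, and the correlation is a normalized (Pearson) correlation computed separately within each zone, which is one possible reading of the text. The function names are hypothetical; the thresholds 0.5 and 0.7 are the preferred values stated above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of two equal-length pixel lists, in [-1, 1].
// A flat input has no structure to match, so it is scored as 0.
double correlation(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n;
    meanB /= n;
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA, db = b[i] - meanB;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    if (saa <= 0.0 || sbb <= 0.0) return 0.0;
    return sab / std::sqrt(saa * sbb);
}

// Returns the complete correlation if every zone reaches zoneThreshold and
// the whole block reaches fullThreshold; otherwise returns 0.
double zoneBasedCorrelation(const std::vector<double>& block,
                            const std::vector<double>& tmpl,
                            const std::vector<std::vector<std::size_t>>& zones,
                            double zoneThreshold = 0.5,
                            double fullThreshold = 0.7) {
    assert(block.size() == tmpl.size());
    for (const std::vector<std::size_t>& zone : zones) {
        std::vector<double> zoneBlock, zoneTmpl;
        for (std::size_t idx : zone) {
            zoneBlock.push_back(block[idx]);
            zoneTmpl.push_back(tmpl[idx]);
        }
        // Early exit: one failing zone rejects the location outright.
        if (correlation(zoneBlock, zoneTmpl) < zoneThreshold) return 0.0;
    }
    const double full = correlation(block, tmpl);
    return full >= fullThreshold ? full : 0.0;
}
```

The early exit is the point of the zone test: a location whose eyelid, iris, or corner region does not resemble the template is discarded before the more expensive complete correlation is computed.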

[0031] The above-described zone-based correlation and complete correlation is repeated by varying the template for a plurality of sizes around the estimated size (increasing and decreasing) and a plurality of orientations around the estimated orientation (clockwise and counter-clockwise rotation), in order to refine the size and orientation of the eye S14j. Such increasing and decreasing of the template size/orientation is readily accomplished by those skilled in the art. This refinement involves the same previously described steps, S14c-S14i. If one or more complete correlation scores at a pixel location of interest result in a value above the threshold, the program selects the highest correlation value in the temporary buffer and its corresponding template size/orientation used for obtaining the highest value and places them in memory S14k. It facilitates understanding to note that the above-described varying of the template size is for further refining the estimated size of the eye from Eq. 1, and the size/orientation of the best-matching template variation in turn indicate the exact size/orientation of the actual eye.

[0032] For example, the template size is increased by 10% and decreased by 10%. If the highest correlation value is from the 19 x 13 resolution template, the estimated size of the eye is not adjusted. If either of the other resolutions produces the highest correlation value, the estimated size of the eye is adjusted so that it matches the template size producing the highest correlation score. Similarly, the template orientation is increased by 10 degrees and decreased by 10 degrees. If one or more complete correlation scores at the pixel location of interest result in a value above the threshold, the program selects the highest correlation value in the temporary buffer and its corresponding template orientation used for obtaining the highest value and places it in memory. If the highest correlation value is from the template at the original estimated orientation, the estimated orientation of the eye is not adjusted. If either of the other orientations produces the highest correlation value, the estimated orientation of the eye is adjusted so that it matches the template orientation producing the highest correlation value.

[0033] As stated previously, the program then continues to the next pixel location identified by the search mask for repeating the above-described zone-based and complete correlation S14l after the size and orientation have been refined for the pixel of interest S14k.

[0034] The program continues on to verify the most likely candidates from the plurality of peak-correlation points in each window as the center pixel of the eye S16-S24. The peak points are located as the points having a local maximum complete correlation score S16. The locations of these peaks are stored in a buffer S18. Referring to Fig. 6, to verify, a plurality of verification steps are used. The steps involve matching known characteristics about a pair of eyes to all combinations of pixels selected during correlation, and a scoring technique is used (figures-of-merit) to select the most likely pair of locations for the center of the eyes.

[0035] Referring both to Figs. 2a and 6, the first step is to form all combinations of pixels selected as likely candidates in the concerned flesh region S20. In other words, each peak pixel is paired with all the other peak pixels in the same flesh region. Referring to Fig. 7, the angular orientation is then determined, that is, the angle between the line formed between the two pixels of interest and a horizontal line through one of the points, preferably the leftmost pixel. If the angular orientation is not within ten degrees of the estimated angular orientation in S14c, the pair is eliminated as possible candidates for the center of both eyes. If it is within five degrees of the estimated angular orientation, the pair is stored along with its particular score.
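The orientation check on a candidate pair can be sketched as below. The `Point` structure, the function names, and the sign convention of the angle (which depends on whether the image y axis points up or down) are assumptions made for illustration; the ten-degree tolerance is the elimination limit stated above and is exposed as a parameter.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2D pixel location.
struct Point {
    double x;
    double y;
};

// Angle, in degrees within [-90, 90], between the line joining the two
// candidates and a horizontal line through the left-hand candidate.
double pairAngleDegrees(Point a, Point b) {
    const Point& left = (a.x <= b.x) ? a : b;
    const Point& right = (a.x <= b.x) ? b : a;
    const double pi = std::acos(-1.0);
    return std::atan2(right.y - left.y, right.x - left.x) * 180.0 / pi;
}

// True if the pair orientation lies within toleranceDeg of the estimate.
bool orientationAgrees(Point a, Point b, double estimatedDeg,
                       double toleranceDeg = 10.0) {
    assert(toleranceDeg >= 0.0);
    return std::fabs(pairAngleDegrees(a, b) - estimatedDeg) <= toleranceDeg;
}
```

Because the left-hand point is always chosen as the origin of the line, the result does not depend on the order in which the two candidates are supplied.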

[0036] Also referring to Fig. 7, the distance between the two candidate eyes is determined. If the distance is not proportional to the size of the eyes according to prior knowledge of human faces, the pair is eliminated as possible candidates for the center of both eyes. If the proportion is within 20% of the normal proportion, the pair is stored along with its particular score.

[0037] Referring to Fig. 8, the next step involves taking the pixels along a horizontal line through the two pixels in a possible combination. A graph of code values versus pixel location for each combination will have a shape as illustrated in Fig. 8. If the shape deviates substantially, the pair is eliminated as possible candidates for the center of the eyes; if it does not substantially deviate, the pair is stored along with its particular score. The deviation is preferably determined by the ratio of the middle peak point and the average of the two valley points, although those skilled in the art can determine other suitable measures of the deviation.

[0038] Referring to Fig. 10, all combinations are then examined for symmetry. This includes taking the distance between all combinations and, at a distance halfway between them, looking for symmetry on both sides of the image through pixels vertically through this halfway point. The region of interest, which contains the face, preferably has a width of twice the distance between the eyes and a height of three times the distance between the eyes. The face region is divided into two halves, the left side and the right side, according to the positions of the eyes. The symmetry is preferably determined by the correlation between the left side and the mirror image of the right side, although those skilled in the art can determine other suitable measures of the symmetry. If symmetry exists for the two sides, the pair and its particular score are again stored; if no symmetry exists, the pair is eliminated as a possible pair of candidates.
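The preferred symmetry measure, the correlation between the left side and the mirror image of the right side, can be illustrated as follows. The sketch assumes the face window has already been extracted as an upright, row-major array of pixel values with the split line at its horizontal center; the function name `mirrorSymmetry` is hypothetical, and any threshold applied to the result would be a separate design choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between the left half of a face window and the
// mirror image of its right half. img is row-major with width * height
// entries; for odd widths the middle column is ignored.
double mirrorSymmetry(const std::vector<double>& img,
                      std::size_t width, std::size_t height) {
    assert(width >= 2 && height >= 1 && img.size() == width * height);
    const std::size_t half = width / 2;
    const double n = static_cast<double>(half * height);
    double meanL = 0.0, meanR = 0.0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < half; ++x) {
            meanL += img[y * width + x];
            meanR += img[y * width + (width - 1 - x)];  // mirrored column
        }
    meanL /= n;
    meanR /= n;
    double slr = 0.0, sll = 0.0, srr = 0.0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < half; ++x) {
            const double dl = img[y * width + x] - meanL;
            const double dr = img[y * width + (width - 1 - x)] - meanR;
            slr += dl * dr;
            sll += dl * dl;
            srr += dr * dr;
        }
    if (sll <= 0.0 || srr <= 0.0) return 0.0;  // flat half: no evidence
    return slr / std::sqrt(sll * srr);
}
```

A perfectly mirror-symmetric window scores 1, and the score drops as the two halves diverge, so it can feed the figure-of-merit scoring in the same way as the other verification steps.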
[0039] All combinations are also examined for their centrality within the extracted sub-image or the fitted ellipse. A preferred measure of such centrality is defined as the distance between the middle points of the two objects and the major axis of the fitted ellipse, although those skilled in the art can determine other suitable measures of the deviation.
[0040] Also referring to Fig. 10, the image is next examined for the existence of a mouth at an estimated position. The program searches for three or four parallel lines (edges) within a rectangular box that has a width equal to the distance between the eyes and at a predetermined distance from the pair of pixels being analyzed. This distance is 1.2 times the distance between the candidate pairs, although those skilled in the art may determine other distance values or similar criteria. If the lines (edges) exist, the pair and its particular score are stored; if not, the pair is eliminated as possible candidates.

[0041] The combinations are then examined for combined correlation of the two candidates. The combined correlation is the sum of the complete correlation scores at the two candidate locations. If the combined correlation is above a predetermined threshold, the pair and their score are stored; if not, the pair is eliminated as possible candidates.

[0042] The most likely pair is the pair that has the highest cumulative score S22. The final locations of the eyes are
determined by this pair S24. The processes of S8-S24 are repeated for the next identified flesh region until the last flesh 
region is processed. 

[0043] The shape of the scoring function for each above-described figure of merit is illustrated in Fig. 9. With this scoring function, even if a combination fails the threshold of a particular figure of merit, it is assigned a large penalty but can still be retained for further consideration instead of being eliminated as described above. If a figure of merit x is satisfactory with respect to the threshold T0, the output of the scoring function, which is the input to the score accumulator, is close to a normalized maximum value of 1.0. If x fails the threshold, an increasing amount of penalty is assessed depending on how badly x fails. The advantage of using such a scoring function is improved robustness if a candidate pair barely fails the threshold but turns out to have the highest cumulative score.
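One possible scoring function with the behavior described above is sketched here. The exact shape is given by Fig. 9, which is not reproduced in this text, so the piecewise-linear form, the `slope` parameter, and the function name are assumptions chosen only to illustrate the idea of a soft penalty that grows with the size of the failure.

```cpp
#include <cassert>

// Assumed piecewise-linear scoring function (the true shape is in Fig. 9).
// A figure of merit x that meets the threshold t0 scores the normalized
// maximum 1.0; below t0 the score falls linearly with the shortfall, so a
// worse failure receives a larger penalty and the score may go negative.
double scoreFigureOfMerit(double x, double t0, double slope) {
    assert(slope > 0.0);
    if (x >= t0) return 1.0;
    return 1.0 - slope * (t0 - x);
}
```

Summing such scores over all figures of merit gives a cumulative score in which a pair that narrowly misses one threshold is penalized rather than discarded, which is the robustness benefit noted above.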

[0044] A computer program written in C++ for performing the steps of the present invention is contained in Appendix A.

[0045] Other features of the invention are included below. 

[0046] The computer program product wherein the first and second objects are first and second human eyes and wherein computing the proportion includes computing a distance between the first and second eyes for satisfying an anthropological prior model.

[0047] The computer program product wherein the first and second objects are first and second human eyes and wherein computing the profile includes predetermining a model of an eye-to-eye profile and determining an actual profile from the image and computing a goodness-of-fit between the actual profile and the model profile.
[0048] The computer program product wherein the first and second objects are first and second human eyes and wherein computing the symmetry includes computing the symmetry between first and second halves of a face window determined by the eye locations.

[0049] The computer program product wherein the first and second objects are first and second human eyes and wherein computing the evidence of mouth includes computing strength and orientation of edges within a mouth window determined by the eye locations.

[0050] The computer program product wherein the first and second objects are first and second human eyes and wherein computing the centrality includes computing a distance from a middle point between the first and second eyes to a major axis of the determined flesh region.

[0051] The computer program product wherein the first and second objects are first and second eyes and wherein computing a combined correlation score includes summing up the individual correlation scores obtained from the matching obtained in step (g).

[0052] The computer program product further comprising the step of normalizing the template for maximizing robust- 
ness of matching. 

[0053] The computer program product further comprising the step of creating an image block of a same size and orientation as the template.

[0054] The computer program product further comprising the step of normalizing the image block for maximizing 
robustness of matching. 

[0055] The computer program product further comprising the step of extracting individual sub-zones in both the template and the image block, and performing cross-correlation for each corresponding sub-zone.
[0056] The computer program product further comprising the step of computing an overall cross-correlation score for the image block if the cross-correlation score for each sub-zone exceeds a threshold.
PARTS LIST

[0057]

10 computer system
20 microprocessor-based unit
30 display
40 keyboard
50 mouse
52 selector
55 CD-ROM
56 printer
57 compact disk
61 floppy disk
62 PC card
APPENDIX



Pstgo 
1 



//****•** 

//* Name: 



eyeivs . cc 



//I E^ftion: semi-automatic eye location based on template matching 
// Feature: . using two eye search windows determined by initial inputs to limi 



t the scope of searching 



//* 
t 

//• 

ial positions) 
//* 



or using default eye search boxes if images are in standard foxma 
using input of the approximate two eye locations (user input init 



//* 
//* 
//* 
//* 



Author: 



. approximate scale is estimated 
. approximate tilt is estimated 

. no other prescreening (morphological operations) 
. one winner 
Jiebo Luo 



Copyright: <c) Eastman Kodak Company, 1997 



//♦ standard header files */ 
# include <stdio.h> 
# include <stdlib.h> 
# include <string.h> 
# include <math.h> 

//* customer header files ♦/ 

•include "image. h- // header file of C++ image processing library 

//* [illegible] right-eye box for passport style pictures */
#define BOXWIDTH1 0.3
#define BOXLEFTOFFSET1 0.1
#define BOXABOVEOFFSET1 0.1

//* [illegible] passport style pictures */
#define BOXWIDTH2 0.1
#define ABSBOXHEIGHT2 21 /* was 11 */
#define ABSBOXWIDTH2 21 /* was 11 */
#define DRIFTAWAY 10 /* was 5 */

//* customer function definition */
#define nint(x) ((x)>0 ? ((int)((x)+0.5)) : ((int)((x)-0.5))) /* always wrapped */
#define lmax(x, y) ((x)>(y) ? (x) : (y))
#define lmin(x, y) ((x)<(y) ? (x) : (y))
#define lsqr(x) ((x)*(x))
#define pvalid(x) (((x) >= 0 && (x) <= 255) ? 1 : 0)
#define TRUE 1
#define FALSE 0
#define SETPEAK 1



//* customer definition for correlation methods */
#define INTENSITY 1
#define NORMINTENSITY 2
#define GRADIENT_ABS 3
#define GRADIENT_POL 4
#define LAPLACIAN 5
#define PHASE_ONLY 6 /* Phase-Only Filter */
#define BIN_PHASE_ONLY 7 /* Binary Phase-Only Filter */

//* customer definition for scale/tilt variation */
#define MAXSCALE 1.45
#define STEPSCALE 0.15
#define MINTILT -10.0
#define MAXTILT +10.0
#define STEPTILT 5.0

//* customer definition of goodness for eye subregions */
#define FUZZYNEIGHBORHOOD 1
#define PEAKNEIGHBORHOOD 2

//* customer definition of eye-searching window */
#define SCOPECENTER1      //* default search window geometrically centered */
#define SCOPEWIDTH1  0.4  //* 40% in total horizontally */
#define SCOPEHEIGHT1 0.2  //* 20% in total vertically */
#define SCOPECENTER2      //* manual search window (smaller) */
#define SCOPEWIDTH2  0.2  //* 20% in total horizontally */
#define SCOPEHEIGHT2 0.1  //* 10% in total vertically */



//* external functions */

image morpho(image &a, int OPERATION, int ELEMENT_TYPE, int ELEMENT_SIZE);
// perform a morphological operation specified by OPERATION using an element specified by ELEMENT_TYPE and ELEMENT_SIZE

image median(image &a, int KERNEL_TYPE, int KERNEL_SIZE);
// perform median filtering using the specified kernel

image texturemask(image &a);
// generate a map of texture activities

int skindetect(image &a, image &sa);
// skin detection

image regioncrop(image &a, image &sa, int REGION);
// crop a rectangular image block that includes the skin region

float ellipsefit(image &ak, int &minor, int &major, float &angle, float &aspect);
// perform elliptic fitting and returns ellipticity measure

image masking(image &ak, image &sa);
// create searching mask map using a combination of morphological operations, texture masking for a skin region

int range(int idim, int jdim, int i, int j);
// check if a pixel is within the range of the image/searching-box

image mcorrelator(image &a, image &templ, image &mask, int maskthresh, int mflag);
// compute the zone-based cross-correlation with the template

image scale(image &a, float scalex, float scaley);
// perform scaling of an image using bilinear interpolation

void update(image &aaa, image &aa, float s);
// update the standing correlation map with the current correlation map

image eyepartition(int nrows, int ncols);
// partition the eye into sub-zones

int peak(image &a, float threshold, int wsize, int setpeakflag);
// locate the local peaks in the correlation map

float maskvariance(image &a, image &mask, float mean);
// calculate the variance of a masked region

farray fmedian(farray &x, int width);
// perform median filtering of an farray

farray faverage(farray &x, int width, int norm);
// perform running-average filtering of an farray

void boxmask(image &mask, int xi1, int xj1, int xi2, int xj2, int boxwidth, int boxheight, int leftflag, int rightflag);
// determine the eye boxes

image lum(image &a);
// convert the image into luminance signals

[illegible] PEAK peaks[], int pnum, float templwidth, int halfheight, [illegible] float stepscale, float steptilt, int &kleft, int &kright, int xi1, int xj1, int xi2, int xj2, int boxwidth, int boxheight);
// form and evaluate pairing of two eye candidates

void marking(image &a, int il, int jl, int ir, int jr);
// mark the final eye locations

int get_ee_line([illegible] farray &profile, int DO_DRAW);
// intensity profile along the eye-to-eye line

float wprofile(farray &, int &lefteye, int &noseridge, int &righteye);
// goodness of the eye-to-eye profile expected to be w-shaped

float mouth(image &a, int x1, int y1, int x2, int y2, int &xm, int &ym, int GFLAG);
// find evidences of mouth

//* Default Threshold Values for Zone-based Matching */
float GOODEYEBROW = 0.25;
float GOODEYECORNER = 0.25;
float GOODEYEIRIS = 0.25;
float GOODPEAKCOR = 0.5;
float GOODPEAKCOR2 = 0.10;

class PEAK {
public:
    int peaknum;
    int i, j;       /* position */
    float s, t;     /* scale and tilt */
    float score;    /* correlation score */
}; /* define an object for candidate peaks */

image epart; 



float t_global, s_global; 

//* begin main() */
int main(int argc, char **argv)
{
    // parse a command line

    // declaration of variables and objects
    image a, sa, ak;

    xi10 = xi1; xj10 = xj1; xi20 = xi2; xj20 = xj2;
    // back-up initial positions

    boxaccept = 1;

    a.read(imginname, short_);
    // load the original image

    k = skindetect(a, sa);
    // skin detection routine returns the total number of skin regions

    for (region = 1; region <= k; region++)
    {
        ak = regioncrop(a, sa, region);
        // crop the k-th flesh region
        ellipticity = ellipsefit(ak, minor, major, angle, aspect);
        // elliptical fitting

        if (ellipticity < MIN_ELLIPTICITY) continue;

        // resize the cropped image according to the estimated eye size

        if (ak.chan() == 3) ak = lum(ak);
        // convert the image to luminance signals

        mask = masking(ak, sa);
        // create searching mask map using a combination of morphological operations, texture masking

        epart = eyepartition((templ.rows()), (templ.cols()));
        // partition the eye template

        // determine a range of sizes and orientations for fine matching

        // begin correlation for matching

        for (float s = maxscale; s >= minscale; s -= stepscale)
        for (float t = mintilt; t <= maxtilt; t += steptilt)
        {
            aa = mcorrelator(aa, templ, mask, maskthreshold, mflag);

            printf("/* peak detection in the transformed correlation plane */\n");
            pnum = peak([illegible], GOODPEAKCOR*127, PEAKNEIGHBORHOOD, 0); /* half window */
            printf("Totally %d peaks found at s=%f, t=%f.\n", pnum, s, t);

            printf("/* peak detection in the inversely transformed correlation plane */\n");
            pnum = peak([illegible], GOODPEAKCOR*127, PEAKNEIGHBORHOOD, 0); /* half window */
            printf("Totally %d peaks found at s=%f, t=%f.\n", pnum, s, t);

            if (s == maxscale && t == mintilt) aaa = aa.copy(float_);
            else update(aaa, aa, s); /* copy/update the final correlation map */
        }



        printf("\n* Checking Local Peaks in the S-T Composite Correlation Map *\n");
        pnum = peak(aaa, GOODPEAKCOR*127, PEAKNEIGHBORHOOD, SETPEAK); /* half window */
        printf("Totally %d peaks (255) found at all scales and tilts.\n", pnum);

        int kleft = 1000, kright = 1000;
        int plen;

        farray profile(256), profile0, profile1, profile2;

        a0 = ak.copy(byte_);

        [illegible] halfheight, halfwidth, stepscale, steptilt, kleft, kright, xi1, xj1, xi2, xj2, boxwidth, boxheight);

        printf("[illegible] locations: left e(%d,%d) right e(%d,%d)\n", xi1, xj1, xi2, xj2);

        plen = get_ee_line(a0, xi1, xj1, xi2, xj2, profile, 1);
        marking(a0, xi1, xj1, xi2, xj2);
        profile0.alloc(plen);
        for (i = 0; i < plen; i++)
            profile0.ptr[i] = profile.ptr[i]; //printf("%d\t", (int)profile[i]);

        printf("Eye-to-eye profile length = %d\n", plen);
        profile.write("profile.dat");

        printf("Profile Filtering: window width = %d\n", plen/16*2+1);

        profile1 = profile0.median(plen/16*2+1);
        profile2 = faverage(profile1, plen/16*2+1, 1);

        int lefteye, noseridge, righteye;

        goodwp = wprofile(profile1, lefteye, noseridge, righteye);
        printf("goodness of the E2E profile = %f\n", wprofile(profile2, lefteye, noseridge, righteye));

        mouth(a0, xi1, xj1, xi2, xj2, im, jm, gflag); /* detect and mark the mouth */

        }

        printf("compensate for subsampling factor = %d\n", fac);
        xi1 = xi1*fac;
        xj1 = xj1*fac;
        xi2 = xi2*fac;
        xj2 = xj2*fac;

        a0.write(imgoutname, byte_);

        printf("Processed one skin region.\n");
    }

    printf("Process done.\n");

    mask.dealloc(); mask0.dealloc(); mask1.dealloc();
    profile.dealloc(); profile0.dealloc(); profile1.dealloc(); profile2.dealloc();
    // free memory space
} // end main()
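
The listing calls peak() with a threshold, a neighborhood half window and a flag, but the body of peak() is not reproduced in the appendix. The following is a minimal sketch of one way such a local-peak search could work, assuming a plain two-dimensional array of scores; the name findPeaks, the container types and the tie handling are assumptions made for illustration, not the library routine used above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Return (row, col) of every pixel whose score is above `threshold` and is not
// exceeded by any neighbor within `halfWindow` pixels. Equal neighbors do not
// disqualify a pixel, so a plateau yields several peaks.
std::vector<std::pair<int, int>> findPeaks(
        const std::vector<std::vector<float>>& map,
        float threshold, int halfWindow) {
    assert(halfWindow >= 0);
    std::vector<std::pair<int, int>> peaks;
    const int rows = static_cast<int>(map.size());
    for (int i = 0; i < rows; ++i) {
        const int cols = static_cast<int>(map[i].size());
        for (int j = 0; j < cols; ++j) {
            if (map[i][j] <= threshold) continue;
            bool isPeak = true;
            for (int di = -halfWindow; di <= halfWindow && isPeak; ++di) {
                for (int dj = -halfWindow; dj <= halfWindow; ++dj) {
                    const int ni = i + di, nj = j + dj;
                    if (ni < 0 || ni >= rows) continue;
                    if (nj < 0 || nj >= static_cast<int>(map[ni].size())) continue;
                    if (map[ni][nj] > map[i][j]) { isPeak = false; break; }
                }
            }
            if (isPeak) peaks.emplace_back(i, j);
        }
    }
    return peaks;
}
```

With scores stored on a scale where 127 corresponds to a correlation of 1.0, which is how the listing appears to use GOODPEAKCOR*127, a call such as findPeaks(map, 0.5f * 127, 1) would keep only locations that correlate above 0.5 and dominate their 3 x 3 neighborhood.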
Claims

1. A computer program product for locating first and second objects each having substantially the same physical characteristics, and the ratio of the distance between the first and second objects and the size of each object is substantially invariant, the computer program product comprising: a computer readable storage medium having a computer program stored thereon for performing the steps of:

(a) determining a potential flesh region in an image;

(b) determining valley regions in an image;

(c) performing template matching for determining a plurality of locations that give a desirable match of the object relative to the template; and

(d) performing verification for determining the likelihood of a pair of potential object candidates at the locations determined in step (c).

2. The computer program product as in claim 1 further comprising either or both (e) determining an estimate size of both the first and second objects based on a shape and size of a determined flesh region or (f) determining an estimate orientation of both the first and second objects based on the shape and orientation of the determined flesh region.

3. The computer program product as in claim 1 further comprising forming a mask image for searching for the first and second objects; the locations where a search is to be performed are determined by the valley regions within the determined flesh region.

4. The computer program product as in claim 1 further comprising (g) reiteratively positioning a template on the locations determined by the mask image for determining a location that gives a desirable match of the object relative to the template.

5. The computer program product as in claim 4 further comprising the step of identifying desirable locations that give locally maximum cross-correlation score.

6. The computer program product as in claim 5 further comprising forming a pair of objects from locations with desirable matching response to the template and within the determined flesh region; and determining desirability of each hypothesized pair.

7. The computer program product as in claim 6 further comprising finding a best pair of locations from all possible combinations of identified locations of both the first and second objects.

8. The computer program product as in claim 7, wherein the step of finding best pair of locations includes computing a plurality of figures of merit and determining a pair of locations that give a best combined figure of merit.

9. The computer program product as in claim 8, wherein computing a plurality of figures of merit includes computing individually or in combination an orientation, proportion, profile, symmetry, evidence of mouth, centrality, or combined correlation score.

10. The computer program product as in claim 9, wherein the first and second objects are first and second human eyes and wherein computing the orientation includes measuring the difference between an orientation of a line connecting the first and second eyes, and an average orientation of the first and second eyes.
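
The orientation measurement recited in claim 10 can be illustrated with a short sketch. It assumes that each candidate carries a row and column position and a tilt in degrees, loosely following the i, j and t fields of the PEAK class in the appendix; the struct EyeCandidate, the function name, and the sign convention (rows increasing downward, tilt measured in the same sense as the connecting line) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical candidate record: position (row i, column j) and tilt in degrees.
struct EyeCandidate {
    double i, j;
    double tiltDeg;
};

// Absolute difference, in degrees, between the orientation of the line joining
// the two candidates and the average of their individual tilts.
double orientationDifference(const EyeCandidate& left, const EyeCandidate& right) {
    const double kPi = 3.14159265358979323846;
    // The two candidates must be distinct points, otherwise the line is undefined.
    assert(left.i != right.i || left.j != right.j);
    const double lineDeg =
        std::atan2(right.i - left.i, right.j - left.j) * 180.0 / kPi;
    const double avgTiltDeg = 0.5 * (left.tiltDeg + right.tiltDeg);
    return std::fabs(lineDeg - avgTiltDeg);
}
```

A small difference indicates that the two templates were matched at tilts consistent with the line joining them, so under this sketch a lower value corresponds to a more plausible pair. How this value is combined with the other figures of merit in claim 9 is not shown here.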



FIG. 2a1 (FIG. 2a comprises FIG. 2a1 and FIG. 2a2)
S2

DETECT HUMAN FLESH REGIONS
TO CREATE FLESH MAP

ESTIMATE THE SIZE/ORIENTATION OF EYES BASED
ON SIZE/ORIENTATION OF FLESH REGIONS

EXTRACT A SUB-IMAGE FOR EACH FLESH REGION
RESIZE THE SUB-IMAGE SUCH THAT THE SIZE OF THE
EYE IS CLOSE TO THE TEMPLATE

S6

S8

DETECT VALLEY REGIONS TO CREATE VALLEY MAP
FOR EACH EXTRACTED SUB-IMAGE

CREATE A MASK MAP AS THE INTERSECTION OF
THE FLESH MAP AND VALLEY MAP

COMPUTE A ZONE-BASED CROSS-CORRELATION
BETWEEN THE TEMPLATE AND THE IMAGE AT
LOCATIONS IDENTIFIED BY THE MASK MAP

LOCATE LOCAL PEAKS WITH CROSS-CORRELATION
SCORES ABOVE A PREDETERMINED THRESHOLD
IN THE CORRELATION PLANE

S16

STORE THE LOCATIONS OF THE PEAKS

CREATE PAIRS OF CANDIDATE EYES FOR EACH FLESH REGION

VERIFY THE FIGURES OF MERIT OF CANDIDATE EYE PAIRS AND
FIG. 2b1 (FIG. 2b comprises FIG. 2b1 and FIG. 2b2)
EXTRACT A BLOCK FROM THE IMAGE WITH ITS
CENTER AT THE CURRENT PIXEL AND ITS SIZE/
ORIENTATION THE SAME AS THE TEMPLATE

S14c

NORMALIZE THE EXTRACTED IMAGE BLOCK TO
MAKE IT HAVE ZERO-MEAN & UNIT VARIANCE

S14d

COMPUTE THE CROSS-CORRELATION BETWEEN
EACH ZONE IN THE EXTRACTED IMAGE BLOCK
AND ITS COUNTERPART IN THE TEMPLATE

S14e

IS THE CROSS-CORRELATION
PREDETERMINED THRESHOLD?

NO

YES

COMPUTE THE CROSS-CORRELATION BETWEEN THE
ENTIRE EXTRACTED IMAGE BLOCK AND THE TEMPLATE

IS THE CROSS-CORRELATION
FOR THE ENTIRE EXTRACTED
BLOCK ABOVE ITS PREDETERMINED
THRESHOLD?

S14i

SIZE/ORIENTATION

REPEAT THE ZONE-BASED CORRELATION PROCESS
FOR A PLURALITY OF SIZES/ORIENTATION AROUND
THE ESTIMATES BY VARYING THE TEMPLATE

STORE THE HIGHEST CROSS-CORRELATION SCORE
AND THE EXACT SIZE/ORIENTATION OF THE TEMPLATE
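
The last two flowchart boxes, repeating the correlation over several sizes and orientations and storing the highest score with the template size and orientation that produced it, can be sketched as a per-pixel maximum. This is an assumption about how such a standing map might be kept; the struct StandingMap and the function updateStanding are hypothetical and are not the update() routine declared in the appendix, which takes only a scale argument.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical standing map: best score seen so far at each pixel, plus the
// scale and tilt of the template that produced it.
struct StandingMap {
    std::vector<float> score, scale, tilt;
    explicit StandingMap(std::size_t n)
        : score(n, std::numeric_limits<float>::lowest()),
          scale(n, 0.0f), tilt(n, 0.0f) {}
};

// Merge the correlation map computed at scale s and tilt t into the standing
// map, keeping the higher score at every pixel.
void updateStanding(StandingMap& best, const std::vector<float>& current,
                    float s, float t) {
    assert(best.score.size() == current.size());
    for (std::size_t p = 0; p < current.size(); ++p) {
        if (current[p] > best.score[p]) {
            best.score[p] = current[p];
            best.scale[p] = s;
            best.tilt[p] = t;
        }
    }
}
```

Calling updateStanding once per (s, t) combination in the sweep leaves a single composite map from which local peaks can then be located.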
FIG. 3 (surviving flowchart text): PIXELS EQUAL TO ZERO; IF TWO OR MORE SKIN REGIONS TOUCH; SKIN MAP
FIG. 9: A PREFERRED SCORING FUNCTION f(x)
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